A computer readability formula of Japanese texts for machine scoring
Abstract
A readability formula is obtained that can be used by computer programs for style checking of Japanese texts and that requires no syntactic or semantic information. The formula is derived as a linear combination of the surface characteristics of the text that are related to its readability: (1) the average number of characters per sentence, (2) for each type of character (Roman alphabet, kanzi, hiragana, katakana), the relative frequency of runs (maximal strings) that consist only of that type of character, (3) the average number of characters per run for each type, and (4) the tooten (comma) to kuten (period) ratio. To find the proper weighting, principal component analysis (PCA) was applied to these characteristics taken from 77 sample texts. We have found a component that is related to readability. Its scores agree with empirical knowledge of reading ease. We have also obtained experimental confirmation that the component is an adequate measure of stylistic ease of reading, by the cloze procedure and by examining the average time taken to fill in one blank of the cloze texts.
Similar resources
Toward a Readability Index for Japanese Learners of EFL
In our previous research a linear readability formula was developed through a series of multiple regression analyses using four independent variables: (1) sentence length, (2) word length, (3) textbook-based word difficulty and (4) textbook-based idiom difficulty, and one dependent variable: year level of EFL textbook. The present study attempts to develop a new readability formula that include...
Do NLP and machine learning improve traditional readability formulas?
Readability formulas are methods used to match texts with the readers' reading level. Several methodological paradigms have previously been investigated in the field. The most popular paradigm dates several decades back and gave rise to well known readability formulas such as the Flesch formula (among several others). This paper compares this approach (henceforth "classic") with an emerging par...
An analysis of a French as a Foreign Language Corpus for Readability Assessment
Readability aims to assess the difficulty of texts based on various linguistic predictors (the lexicon used, the complexity of sentences, the coherence of the text, etc.). It is an active field that has applications in a large number of NLP domains, among which machine translation, text simplification, text summarisation, or CALL (Computer-Assisted Language Learning). For CALL, readability tool...
Readability Assessment of Translated Texts
In this paper we investigate how readability varies between texts originally written in English and texts translated into English. For quantification, we analyze several factors that are relevant in assessing readability – shallow, lexical and morpho-syntactic features – and we employ the widely used Flesch-Kincaid formula to measure the variation of the readability level between original Engli...
Japanese Controlled Language Rules to Improve Machine Translatability of Municipal Documents
We report on experiments to test the effectiveness of controlled language (CL) rules on texts from Japanese municipal websites. We compiled a set of rules by trial and error, systematically rewriting Japanese source texts and analysing the machine translation (MT) outputs. We then employed native English speakers with little knowledge of Japanese as human evaluators and tested the understandabi...